Bandit Learning with Positive Externalities

Authors

  • Virag Shah
  • Jose Blanchet
  • Ramesh Johari
Abstract

Many platforms are characterized by the fact that future user arrivals are likely to have preferences similar to those of users who were satisfied in the past. In other words, arrivals exhibit positive externalities. We study multi-armed bandit (MAB) problems with positive externalities. Our model has a finite number of arms, and users are distinguished by the arm(s) they prefer. We model positive externalities by assuming that the preferred arms of future arrivals are self-reinforcing based on the experiences of past users. We show that classical algorithms such as UCB, which are optimal in the standard MAB setting, may exhibit linear regret in the presence of positive externalities. We provide an algorithm that achieves optimal regret and show that this optimal regret exhibits a substantially different structure from that observed in the standard MAB setting.
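The abstract does not spell out the arrival dynamics or the paper's algorithm, so the following is only a minimal simulation sketch of the general idea: a Pólya-urn style arrival model (an assumption, not the paper's specification) in which each new user's preferred arm is drawn with probability proportional to the rewards accumulated on that arm so far, and a standard UCB1 policy is compared against an oracle that always pulls the best arm. All names (simulate, ucb1, theta0, the reward rule) are illustrative assumptions.

```python
# Minimal sketch: bandit with self-reinforcing user preferences (assumed model).
import random
import math

def simulate(policy, mu, horizon, theta0=1.0, seed=0):
    """Run one horizon of the assumed positive-externality model.

    mu      -- success probability of each arm for a user who prefers it
    theta0  -- initial urn weight of every arm (assumption, not from the paper)
    policy  -- callable (t, counts, sums) -> arm index
    Returns the cumulative reward collected by the policy.
    """
    rng = random.Random(seed)
    K = len(mu)
    weights = [theta0] * K          # urn weights driving future preferences
    counts = [0] * K                # number of pulls per arm
    sums = [0.0] * K                # total reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        # Positive externality: the arriving user prefers arm a with
        # probability proportional to its current urn weight.
        preferred = rng.choices(range(K), weights=weights)[0]
        arm = policy(t, counts, sums)
        # Assumed reward rule: reward only if the pulled arm matches the
        # user's preference and the arm succeeds for that user.
        reward = 1.0 if (arm == preferred and rng.random() < mu[arm]) else 0.0
        counts[arm] += 1
        sums[arm] += reward
        weights[arm] += reward       # satisfied users reinforce their arm
        total += reward
    return total

def ucb1(t, counts, sums):
    """Standard UCB1 index policy (not the paper's proposed algorithm)."""
    for a, n in enumerate(counts):
        if n == 0:
            return a                 # pull each arm once first
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a]
               + math.sqrt(2.0 * math.log(t) / counts[a]))

if __name__ == "__main__":
    mu = [0.9, 0.5]
    T = 5000
    best = max(range(len(mu)), key=lambda a: mu[a])
    r_ucb = simulate(ucb1, mu, T)
    r_opt = simulate(lambda t, c, s: best, mu, T)
    print(f"UCB1 reward: {r_ucb:.0f}, always-best-arm reward: {r_opt:.0f}")
```

Under these assumed dynamics, early unlucky pulls can shift future arrivals toward an inferior arm, which is the intuition behind the paper's claim that index policies like UCB can accumulate linear regret here; the sketch only illustrates the mechanism, not the paper's regret bounds.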


Related articles

Strategic Experimentation with Heterogeneous Agents and Payoff Externalities

In this paper, I examine the effect of introducing heterogeneity between players in models of strategic experimentation. I consider a two-armed bandit problem with one safe arm and one risky arm. There are two players, and each has access to such a bandit. A player using the safe arm experiences a safe flow payoff. The risky arm can either be good or bad. A bad risky arm is worse than the safe...


Machine Learning Approaches for Interactive Verification

Interactive verification is a new problem, which is closely related to active learning, but aims to query as many positive instances as possible within some limited query budget. We point out the similarity between interactive verification and another machine learning problem called contextual bandit. The similarity allows us to design interactive verification approaches from existing contextua...


Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which, when pulled, yields a positive reward. There is a player who sequentially selects one of the arms at each time step. The goal of the player is to maximize its undiscounted reward over a time horizo...


Boosting with Online Binary Learners for the Multiclass Bandit Problem

We consider the problem of online multiclass prediction in the bandit setting. Compared with the full-information setting, in which the learner can receive the true label as feedback after making each prediction, the bandit setting assumes that the learner can only know the correctness of the predicted label. Because the bandit setting is more restricted, it is difficult to design good bandit l...


Learning and animal behavior: exploring the dynamics of simple models

Introduction: All living organisms must interact with an external environment and should respond to it in a way that maximizes their probability of reproduction and survival. If an organism can learn, it will be able to modify its behavior based on environmental feedback and potentially increase its survival probability. The processes underlying learning and behavior are of interest to researchers ...



Journal:
  • CoRR

Volume: abs/1802.05693

Pages: -

Publication year: 2018